Linear Discriminant Analysis of Character Sequences Using Occurrences of Words
Abstract
Classification of character sequences, where the characters come from a finite set, arises in disciplines such as molecular biology and computer science. For discriminant analysis of such character sequences, the Bayes classifier based on Markov models turns out to have class boundaries defined by linear functions of occurrences of words in the sequences. It is shown that for such classifiers based on Markov models with unknown orders, if the orders are estimated from the data using cross-validation, the resulting classifier has Bayes risk consistency under suitable conditions. Even when Markov models are not valid for the data, we develop methods for constructing classifiers based on linear functions of occurrences of words, where the word length is chosen by cross-validation. Such linear classifiers are constructed using ideas of support vector machines, regression depth, and distance weighted discrimination. We show that classifiers with linear class boundaries have certain optimal properties in terms of their asymptotic misclassification probabilities. The performance of these classifiers is demonstrated in various simulated and benchmark data sets.
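The abstract's first claim is that a Bayes classifier built on Markov models has class boundaries that are linear in word occurrences. This can be checked numerically. Under an order-k Markov model, the log-likelihood of a sequence, conditional on its first k characters, is a sum over every word w of length k+1: the number of occurrences of w times the log probability of w's last character given its first k characters. The log-likelihood ratio between two classes is therefore a linear function of those occurrence counts, with one weight per word.

The Python sketch below is a minimal illustration under assumptions that do not come from the paper:

- a four-letter DNA alphabet,
- transition probabilities estimated with add-one smoothing,
- equal class priors,
- the initial k characters ignored,
- leave-one-out cross-validation as the order-selection scheme.

All function names (`word_counts`, `fit_markov`, `markov_loglik`, `linear_score`, `classify`, `choose_order_loo`) are hypothetical. The sketch does not attempt the SVM, regression-depth, or distance-weighted-discrimination constructions.

```python
# Minimal sketch under stated assumptions (not the paper's implementation):
# DNA alphabet, add-alpha smoothing, equal priors, leave-one-out order selection.
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # hypothetical finite alphabet, chosen for illustration


def word_counts(seq, length):
    """Overlapping occurrence counts of every word of the given length in seq."""
    return Counter(seq[i:i + length] for i in range(len(seq) - length + 1))


def fit_markov(seqs, order, alpha=1.0):
    """Order-`order` transition log-probabilities, estimated with add-alpha smoothing."""
    counts = Counter()
    for s in seqs:
        counts.update(word_counts(s, order + 1))
    logp = {}
    for ctx in map("".join, product(ALPHABET, repeat=order)):
        total = sum(counts[ctx + c] for c in ALPHABET) + alpha * len(ALPHABET)
        for c in ALPHABET:
            logp[ctx + c] = math.log((counts[ctx + c] + alpha) / total)
    return logp


def markov_loglik(seq, logp, order):
    """Log-likelihood of seq given its first `order` characters, one transition at a time."""
    return sum(logp[seq[i - order:i + 1]] for i in range(order, len(seq)))


def linear_score(seq, logp1, logp2, order):
    """The same two-class log-likelihood ratio, written as a linear function of word counts."""
    n = word_counts(seq, order + 1)
    return sum(count * (logp1[w] - logp2[w]) for w, count in n.items())


def classify(seq, models, order):
    """Bayes rule under equal priors: pick the label whose Markov model fits seq best."""
    return max(models, key=lambda label: markov_loglik(seq, models[label], order))


def choose_order_loo(data, orders):
    """Pick the Markov order with the fewest leave-one-out misclassifications."""
    labels = sorted({label for _, label in data})
    errors = {}
    for k in orders:
        errors[k] = 0
        for i, (seq, label) in enumerate(data):
            rest = data[:i] + data[i + 1:]
            models = {c: fit_markov([s for s, l in rest if l == c], k) for c in labels}
            errors[k] += classify(seq, models, k) != label
    return min(orders, key=lambda k: errors[k])


# Usage on a toy, made-up training set (illustrative only).
train = [("ACACACGT", "a"), ("CACACACA", "a"), ("GGTTGGTT", "b"), ("TTGGTTGA", "b")]
k = choose_order_loo(train, orders=[0, 1, 2])
models = {c: fit_markov([s for s, l in train if l == c], k) for c in ("a", "b")}
print("chosen order:", k, "label for ACACAC:", classify("ACACAC", models, k))
```

The weights `logp1[w] - logp2[w]` in `linear_score` are the coefficients of the linear class boundary. With unequal priors, adding the log prior ratio to that score gives the full two-class Bayes discriminant.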
Similar resources
Discrete Meyer Wavelet Transform Features For online Hangul Script Recognition
Online Hangul script recognition is important when writers input characters into computers and communication devices (such as PDAs and mobile phones). In this study, a method based on wavelet transform features is proposed to improve the performance of online handwritten Hangul character recognition. The main idea is applying the Discrete Wavelet Transform (DWT) spectral analysis to the recognition o...
Feature Transformation with Generalized Learning Vector Quantization for Hand-Written Chinese Character Recognition
In this paper, the generalized learning vector quantization (GLVQ) algorithm is applied to design a handwritten Chinese character recognition system. The system proposed herein consists of two modules, feature transformation and recognizer. The feature transformation module is designed to extract discriminative features to enhance the recognition performance. The initial feature transformation ...
DLDA-based Iris Recognition from Image Sequences with Various Focus Information
In this paper, we present a new scheme for iris recognition from focus-varying sequences of iris images. Most current state-of-the-art iris recognition systems use highly focused iris images to obtain high accuracy. These systems do not recognize defocused iris images, and they also need a long focusing time to acquire high-quality images. Unlike the current iris recognition systems,...
A prediction distribution of atmospheric pollutants using support vector machines, discriminant analysis and mapping tools (Case study: Tunisia)
Monitoring and controlling air quality parameters form an important subject of atmospheric and environmental research today due to the health impacts caused by the different pollutants present in the urban areas. The support vector machine (SVM), as a supervised learning analysis method, is considered an effective statistical tool for the prediction and analysis of air quality. The work present...
A Multi Linear Discriminant Analysis Method Using a Subtraction Criteria
Linear dimension reduction has been used in various applications such as image processing and pattern recognition. All these methods fold the original data into vectors and project them onto a small number of dimensions. But in some applications we may face data that are not vectors, such as image data. Folding multidimensional data into vectors causes the curse of dimensionality and mixes the differe...
Publication date: 2013